Introduction

Column

Marginal Effects

After running a regression model of some sort, the common way of interpreting the relationships between predictors and the outcome is by interpreting the regression coefficients. In many situations these interpretations are straightforward, however, in some settings the interpretations can be tricky, or the coefficients can simply not be interpreted.

These complications arise most commonly when a model involves an interaction or moderation. In these sort of models, interpretation of the coefficients requires special care due to the relationship between the various interacted predictors, and the set of information obtainable from the coefficients is often limited without resorting to significant algebra.

Marginal effects offer an improvement over simple coefficient interpretation. Marginal means allow you to estimate the average response under a set of conditions; e.g. the average response for each racial group, or the average response when age and weight are at certain values.

Marginal slopes estimate the slope between a predictor and the outcome when other variables are at some specific value; e.g. the average slope on age within each gender, or the average slope on age when weight is at certain values.

In either case, in addition to estimating these marginal effects, we can easily test hypotheses of equality between them via pairwise comparisons. For example, is the average response different by ethnicity, or is the slope on age different between genders.

If you are familiar with interpreting regression coefficients, and specifically interactions, you may have some idea of how to start addressing all the above examples. However, to complete answer these research questions would require more than simply a table of regression coefficients. With marginal effects, one additional set of results can address all these questions.

Finally, marginal effect estimation is a step towards creating informative visualizations of relationships such as interaction plots.

Data and Libraries

Code

Stata

The webuse command can load the directly.

. webuse margex
(Artificial data for margins)

. summarize, sep(0)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      3,000    69.73357    21.53986          0      146.3
     outcome |      3,000    .1696667    .3754023          0          1
         sex |      3,000    .5006667    .5000829          0          1
       group |      3,000       1.828    .7732714          1          3
         age |      3,000      39.799    11.54174         20         60
    distance |      3,000    58.58566    181.3987        .25   748.8927
         ycn |      3,000    74.47263    22.01264          0      153.8
          yc |      3,000    .2413333    .4279633          0          1
   treatment |      3,000    .5006667    .5000829          0          1
    agegroup |      3,000    2.263667    .8316913          1          3
         arm |      3,000    1.970667    .7899457          1          3

R

library(haven)
library(emmeans)
library(interactions)
m <- haven::read_dta("http://www.stata-press.com/data/r15/margex.dta")
m <- data.frame(m)
m$sex <- as_factor(m$sex)
m$group <- as_factor(m$group)
head(m)
      y outcome    sex group age distance   ycn yc treatment agegroup arm
1 102.6       1 female     1  55    11.75  97.3  1         1        3   1
2  77.2       0 female     1  60    30.75  78.6  0         1        3   1
3  80.5       0 female     1  27    14.25  54.5  0         1        1   1
4  82.5       1 female     1  60    29.25  98.4  1         1        3   1
5  71.6       1 female     1  52    19.00 105.3  1         1        3   1
6  83.2       1 female     1  49     2.50  89.7  0         1        3   1

Categorical Variables

Code

Stata

. regress y i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =     15.48
       Model |  14229.0349         2  7114.51743   Prob > F        =    0.0000
    Residual |  1377203.97     2,997  459.527518   R-squared       =    0.0102
-------------+----------------------------------   Adj R-squared   =    0.0096
       Total |  1391433.01     2,999  463.965657   Root MSE        =    21.437

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   .4084991   .8912269     0.46   0.647    -1.338979    2.155977
          3  |   5.373138   1.027651     5.23   0.000     3.358165     7.38811
             |
       _cons |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
------------------------------------------------------------------------------

. margins group

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          1  |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
          2  |   68.76655   .6411134   107.26   0.000     67.50948    70.02361
          3  |   73.73119   .8202484    89.89   0.000     72.12288    75.33949
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ group, data = m))

Call:
lm(formula = y ~ group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-73.531 -14.758   0.101  14.536  72.569 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  68.3580     0.6191 110.419  < 2e-16 ***
group2        0.4085     0.8912   0.458    0.647    
group3        5.3731     1.0277   5.229 1.83e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared:  0.01023,   Adjusted R-squared:  0.009566 
F-statistic: 15.48 on 2 and 2997 DF,  p-value: 2.045e-07
emmeans::emmeans(mod1, ~ group)
 group emmean    SE   df lower.CL upper.CL
 1       68.4 0.619 2997     67.1     69.6
 2       68.8 0.641 2997     67.5     70.0
 3       73.7 0.820 2997     72.1     75.3

Confidence level used: 0.95 

Math

Assume a linear regression set up with a single categorical variable, \(G\), with three groups. We fit

\[ E(Y|G) = \beta_0 + \beta_1g_2 + \beta_2g_3, \]

where \(g_2\) and \(g_3\) are dummy variables representing membership in groups 2 and 3 respectively (\(g_{2i} = 1\) if observation \(i\) has \(G = 2\), and equal to 0 otherwise.) Since \(g_1\) is not found in the model, it is the reference category.

Therefore, \(\beta_0\) represents the average response among the reference category \(G = 1\). \(\beta_1\) represents the difference in the average response between groups \(G = 1\) and \(G = 2\). Therefore, \(\beta_0 + \beta_1\) is the average response in group \(G = 2\). A similar argument can be made about \(\beta_2\) and group 3.

Interactions with Categorical Variables

Code

Stata

. regress y i.sex##i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(5, 2994)      =     92.91
       Model |  186898.199         5  37379.6398   Prob > F        =    0.0000
    Residual |  1204534.81     2,994  402.316235   R-squared       =    0.1343
-------------+----------------------------------   Adj R-squared   =    0.1329
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.058

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   21.62507   1.509999    14.32   0.000     18.66433    24.58581
             |
       group |
          2  |   11.37444   1.573314     7.23   0.000     8.289552    14.45932
          3  |    21.6188   1.588487    13.61   0.000     18.50416    24.73344
             |
   sex#group |
   female#2  |  -4.851582   1.942744    -2.50   0.013     -8.66083   -1.042333
   female#3  |  -6.084875   3.004638    -2.03   0.043    -11.97624   -.1935115
             |
       _cons |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
------------------------------------------------------------------------------

. margins sex

Predictive margins                              Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
       male  |   59.77145   .6454388    92.61   0.000      58.5059      61.037
     female  |   78.20318   .7105463   110.06   0.000     76.80997    79.59639
------------------------------------------------------------------------------

. margins sex#group

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   sex#group |
     male#1  |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
     male#2  |   61.98514   .7772248    79.75   0.000     60.46119    63.50908
     male#3  |    72.2295   .8074975    89.45   0.000     70.64619     73.8128
   female#1  |   72.23577     .63942   112.97   0.000     70.98203    73.48952
   female#2  |   78.75863   .9434406    83.48   0.000     76.90877    80.60849
   female#3  |    87.7697   2.468947    35.55   0.000     82.92869     92.6107
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*group, data = m))

Call:
lm(formula = y ~ sex * group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-72.029 -13.285   0.464  13.741  67.964 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        50.611      1.368  36.998  < 2e-16 ***
sexfemale          21.625      1.510  14.321  < 2e-16 ***
group2             11.374      1.573   7.230 6.12e-13 ***
group3             21.619      1.588  13.610  < 2e-16 ***
sexfemale:group2   -4.852      1.943  -2.497   0.0126 *  
sexfemale:group3   -6.085      3.005  -2.025   0.0429 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared:  0.1343,    Adjusted R-squared:  0.1329 
F-statistic: 92.91 on 5 and 2994 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ group)
 group emmean    SE   df lower.CL upper.CL
 1       61.4 0.755 2994     59.9     62.9
 2       70.4 0.611 2994     69.2     71.6
 3       80.0 1.299 2994     77.5     82.5

Results are averaged over the levels of: sex 
Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ group + sex)
 group sex    emmean    SE   df lower.CL upper.CL
 1     male     50.6 1.368 2994     47.9     53.3
 2     male     62.0 0.777 2994     60.5     63.5
 3     male     72.2 0.807 2994     70.6     73.8
 1     female   72.2 0.639 2994     71.0     73.5
 2     female   78.8 0.943 2994     76.9     80.6
 3     female   87.8 2.469 2994     82.9     92.6

Confidence level used: 0.95 

Math

Consider the simplest case, with two binary variables \(Z\) and \(K\). We fit the model

\[ E(Y|Z,K) = \beta_0 + \beta_1Z + \beta_2K + \beta_3ZK, \]

where \(ZK = 0\) only if both \(Z = 1\) and \(K = 1\).

This is functionally equivalent to defining a new variable \(L\),

\[ L = \begin{cases} 0, & K = 0 \& Z = 0 \\ 1, & K = 0 \& Z = 1 \\ 2, & K = 1 \& Z = 0 \\ 3, & K = 1 \& Z = 1, \end{cases} \]

and fitting the model

\[ E(Y|L) = \beta_0 + \beta_1l_1 + \beta_2l_2 + \beta_3l_3. \]

Categorical and Continuous Variables

Code

Stata

. regress y i.sex c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =    209.88
       Model |  170938.657         2  85469.3283   Prob > F        =    0.0000
    Residual |  1220494.35     2,997  407.238688   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1223
       Total |  1391433.01     2,999  463.965657   Root MSE        =     20.18

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.03271   .7777377    18.04   0.000     12.50775    15.55766
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
       _cons |   82.78115   1.323613    62.54   0.000     80.18586    85.37643
------------------------------------------------------------------------------

. margins sex, at(age = (30 40))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          30

2._at        : age             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   67.65015   .5604889   120.70   0.000     66.55116    68.74913
   1#female  |   81.68285   .6911128   118.19   0.000     80.32775    83.03796
     2#male  |   62.60648    .537682   116.44   0.000     61.55222    63.66074
   2#female  |   76.63919    .533784   143.58   0.000     75.59257    77.68581
------------------------------------------------------------------------------

R

For each unique level of the continuous variable, a seperate emmeans call is needed apparently. If anyone knows a way around this, please let me know.

summary(mod1 <- lm(y ~ sex + age, data = m))

Call:
lm(formula = y ~ sex + age, data = m)

Residuals:
   Min     1Q Median     3Q    Max 
-71.99 -13.88  -0.15  13.76  75.28 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  82.7811     1.3236   62.54   <2e-16 ***
sexfemale    14.0327     0.7777   18.04   <2e-16 ***
age          -0.5044     0.0337  -14.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.1223 
F-statistic: 209.9 on 2 and 2997 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ sex, at = list(age = 30))
 sex    emmean    SE   df lower.CL upper.CL
 male     67.7 0.560 2997     66.6     68.7
 female   81.7 0.691 2997     80.3     83.0

Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ sex, at = list(age = 40))
 sex    emmean    SE   df lower.CL upper.CL
 male     62.6 0.538 2997     61.6     63.7
 female   76.6 0.534 2997     75.6     77.7

Confidence level used: 0.95 

Categorical and Continuous Interactions

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, at(age = (30 40))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          30

2._at        : age             =          40

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   67.58054   .5984013   112.94   0.000     66.40722    68.75386
   1#female  |   81.83127   .8228617    99.45   0.000     80.21784     83.4447
     2#male  |   62.65093   .5541361   113.06   0.000     61.56441    63.73746
   2#female  |   76.67755   .5461908   140.39   0.000      75.6066    77.74849
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
emmeans::emmeans(mod1, ~ sex, at = list(age = 30))
 sex    emmean    SE   df lower.CL upper.CL
 male     67.6 0.598 2996     66.4     68.8
 female   81.8 0.823 2996     80.2     83.4

Confidence level used: 0.95 
emmeans::emmeans(mod1, ~ sex, at = list(age = 40))
 sex    emmean    SE   df lower.CL upper.CL
 male     62.7 0.554 2996     61.6     63.7
 female   76.7 0.546 2996     75.6     77.7

Confidence level used: 0.95 

No Interactions

Code

Stata

. regress y i.sex c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =    209.88
       Model |  170938.657         2  85469.3283   Prob > F        =    0.0000
    Residual |  1220494.35     2,997  407.238688   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1223
       Total |  1391433.01     2,999  463.965657   Root MSE        =     20.18

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.03271   .7777377    18.04   0.000     12.50775    15.55766
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
       _cons |   82.78115   1.323613    62.54   0.000     80.18586    85.37643
------------------------------------------------------------------------------

. margins, dydx(age)

Average marginal effects                        Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.5043666    .033698   -14.97   0.000    -.5704402   -.4382931
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex + age, data = m))

Call:
lm(formula = y ~ sex + age, data = m)

Residuals:
   Min     1Q Median     3Q    Max 
-71.99 -13.88  -0.15  13.76  75.28 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  82.7811     1.3236   62.54   <2e-16 ***
sexfemale    14.0327     0.7777   18.04   <2e-16 ***
age          -0.5044     0.0337  -14.97   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2997 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.1223 
F-statistic: 209.9 on 2 and 2997 DF,  p-value: < 2.2e-16
emmeans::emtrends(mod1, ~ 1, var = "age")
 1       age.trend     SE   df lower.CL upper.CL
 overall    -0.504 0.0337 2997    -0.57   -0.438

Results are averaged over the levels of: sex 
Confidence level used: 0.95 

Interactions

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, dydx(age)

Average marginal effects                        Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age          |
         sex |
       male  |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
     female  |  -.5153724   .0472435   -10.91   0.000    -.6080054   -.4227395
------------------------------------------------------------------------------

R

summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
emmeans::emtrends(mod1, ~ sex, var = "age")
 sex    age.trend     SE   df lower.CL upper.CL
 male      -0.493 0.0481 2996   -0.587   -0.399
 female    -0.515 0.0472 2996   -0.608   -0.423

Confidence level used: 0.95 

Categorical Variables

Code

Stata

. regress y i.group

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(2, 2997)      =     15.48
       Model |  14229.0349         2  7114.51743   Prob > F        =    0.0000
    Residual |  1377203.97     2,997  459.527518   R-squared       =    0.0102
-------------+----------------------------------   Adj R-squared   =    0.0096
       Total |  1391433.01     2,999  463.965657   Root MSE        =    21.437

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   .4084991   .8912269     0.46   0.647    -1.338979    2.155977
          3  |   5.373138   1.027651     5.23   0.000     3.358165     7.38811
             |
       _cons |   68.35805   .6190791   110.42   0.000     67.14419    69.57191
------------------------------------------------------------------------------

. margins group, pwcompare(pv)

Pairwise comparisons of adjusted predictions    Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

-----------------------------------------------------
             |            Delta-method    Unadjusted
             |   Contrast   Std. Err.      t    P>|t|
-------------+---------------------------------------
       group |
     2 vs 1  |   .4084991   .8912269     0.46   0.647
     3 vs 1  |   5.373138   1.027651     5.23   0.000
     3 vs 2  |   4.964638   1.041073     4.77   0.000
-----------------------------------------------------

R

Note that pairs is a generic, meaning that emmeans::pairs is an invalid call. Despite this, the “emmeans” package is required to call the below.

summary(mod1 <- lm(y ~ group, data = m))

Call:
lm(formula = y ~ group, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-73.531 -14.758   0.101  14.536  72.569 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  68.3580     0.6191 110.419  < 2e-16 ***
group2        0.4085     0.8912   0.458    0.647    
group3        5.3731     1.0277   5.229 1.83e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.44 on 2997 degrees of freedom
Multiple R-squared:  0.01023,   Adjusted R-squared:  0.009566 
F-statistic: 15.48 on 2 and 2997 DF,  p-value: 2.045e-07
pairs(emmeans::emmeans(mod1, ~ group), adjust = "none")
 contrast estimate    SE   df t.ratio p.value
 1 - 2      -0.408 0.891 2997 -0.458  0.6467 
 1 - 3      -5.373 1.028 2997 -5.229  <.0001 
 2 - 3      -4.965 1.041 2997 -4.769  <.0001 

The adjust = "none" argument skips any multiple comparisons correction. This is done to obtain the same results as the regression coefficients. If you’d prefer a more ANOVA post-hoc approach, there are other options for adjust, see ?emmeans::contrast for details.

Categorical Interactions

Column

Stata

. regress y i.group##i.sex

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(5, 2994)      =     92.91
       Model |  186898.199         5  37379.6398   Prob > F        =    0.0000
    Residual |  1204534.81     2,994  402.316235   R-squared       =    0.1343
-------------+----------------------------------   Adj R-squared   =    0.1329
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.058

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |
          2  |   11.37444   1.573314     7.23   0.000     8.289552    14.45932
          3  |    21.6188   1.588487    13.61   0.000     18.50416    24.73344
             |
         sex |
     female  |   21.62507   1.509999    14.32   0.000     18.66433    24.58581
             |
   group#sex |
   2#female  |  -4.851582   1.942744    -2.50   0.013     -8.66083   -1.042333
   3#female  |  -6.084875   3.004638    -2.03   0.043    -11.97624   -.1935115
             |
       _cons |    50.6107   1.367932    37.00   0.000     47.92852    53.29288
------------------------------------------------------------------------------

. margins group#sex, pwcompare(pv)

Pairwise comparisons of adjusted predictions    Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------
                          |            Delta-method    Unadjusted
                          |   Contrast   Std. Err.      t    P>|t|
--------------------------+---------------------------------------
                group#sex |
  (1#female) vs (1#male)  |   21.62507   1.509999    14.32   0.000
    (2#male) vs (1#male)  |   11.37444   1.573314     7.23   0.000
  (2#female) vs (1#male)  |   28.14793   1.661722    16.94   0.000
    (3#male) vs (1#male)  |    21.6188   1.588487    13.61   0.000
  (3#female) vs (1#male)  |     37.159   2.822577    13.16   0.000
  (2#male) vs (1#female)  |  -10.25064   1.006447   -10.18   0.000
(2#female) vs (1#female)  |   6.522856    1.13971     5.72   0.000
  (3#male) vs (1#female)  |  -.0062748   1.030005    -0.01   0.995
(3#female) vs (1#female)  |   15.53392   2.550404     6.09   0.000
  (2#female) vs (2#male)  |   16.77349   1.222358    13.72   0.000
    (3#male) vs (2#male)  |   10.24436   1.120772     9.14   0.000
  (3#female) vs (2#male)  |   25.78456   2.588393     9.96   0.000
  (3#male) vs (2#female)  |  -6.529131   1.241826    -5.26   0.000
(3#female) vs (2#female)  |   9.011069   2.643063     3.41   0.001
  (3#female) vs (3#male)  |    15.5402   2.597644     5.98   0.000
------------------------------------------------------------------

. margins sex@group, contrast(pv nowald)

Contrasts of adjusted predictions               Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------
                    |            Delta-method
                    |   Contrast   Std. Err.      t    P>|t|
--------------------+---------------------------------------
          sex@group |
(female vs base) 1  |   21.62507   1.509999    14.32   0.000
(female vs base) 2  |   16.77349   1.222358    13.72   0.000
(female vs base) 3  |    15.5402   2.597644     5.98   0.000
------------------------------------------------------------

R

To do: All pairwise comparisons

summary(mod1 <- lm(y ~ group*sex, data = m))

Call:
lm(formula = y ~ group * sex, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-72.029 -13.285   0.464  13.741  67.964 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        50.611      1.368  36.998  < 2e-16 ***
group2             11.374      1.573   7.230 6.12e-13 ***
group3             21.619      1.588  13.610  < 2e-16 ***
sexfemale          21.625      1.510  14.321  < 2e-16 ***
group2:sexfemale   -4.852      1.943  -2.497   0.0126 *  
group3:sexfemale   -6.085      3.005  -2.025   0.0429 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.06 on 2994 degrees of freedom
Multiple R-squared:  0.1343,    Adjusted R-squared:  0.1329 
F-statistic: 92.91 on 5 and 2994 DF,  p-value: < 2.2e-16
emmeans::contrast(emmeans::emmeans(mod1, ~ sex | group), "pairwise")
group = 1:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -21.6 1.51 2994 -14.321 <.0001 

group = 2:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -16.8 1.22 2994 -13.722 <.0001 

group = 3:
 contrast      estimate   SE   df t.ratio p.value
 male - female    -15.5 2.60 2994  -5.982 <.0001 

Plotting

Code

Stata

. regress y i.sex##c.age

      Source |       SS           df       MS      Number of obs   =     3,000
-------------+----------------------------------   F(3, 2996)      =    139.91
       Model |  170983.675         3  56994.5583   Prob > F        =    0.0000
    Residual |  1220449.33     2,996   407.35959   R-squared       =    0.1229
-------------+----------------------------------   Adj R-squared   =    0.1220
       Total |  1391433.01     2,999  463.965657   Root MSE        =    20.183

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |
     female  |   14.92308   2.789012     5.35   0.000     9.454508    20.39165
         age |  -.4929608   .0480944   -10.25   0.000    -.5872622   -.3986595
             |
   sex#c.age |
     female  |  -.0224116   .0674167    -0.33   0.740    -.1545994    .1097762
             |
       _cons |   82.36936   1.812958    45.43   0.000      78.8146    85.92413
------------------------------------------------------------------------------

. margins sex, at(age = (20(10)60))

Adjusted predictions                            Number of obs     =      3,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : age             =          20

2._at        : age             =          30

3._at        : age             =          40

4._at        : age             =          50

5._at        : age             =          60

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _at#sex |
     1#male  |   72.51015   .9336568    77.66   0.000     70.67947    74.34082
   1#female  |     86.985    1.22567    70.97   0.000     84.58175    89.38824
     2#male  |   67.58054   .5984013   112.94   0.000     66.40722    68.75386
   2#female  |   81.83127   .8228617    99.45   0.000     80.21784     83.4447
     3#male  |   62.65093   .5541361   113.06   0.000     61.56441    63.73746
   3#female  |   76.67755   .5461908   140.39   0.000      75.6066    77.74849
     4#male  |   57.72132   .8477402    68.09   0.000     56.05911    59.38353
   4#female  |   71.52382    .604927   118.24   0.000     70.33771    72.70994
     5#male  |   52.79171   1.262091    41.83   0.000     50.31706    55.26637
   5#female  |    66.3701   .9380502    70.75   0.000     64.53081    68.20939
------------------------------------------------------------------------------

. marginsplot, recastci(rarea)

  Variables that uniquely identify margins: age sex

R

summary(mod1 <- lm(y ~ sex*age, data = m))

Call:
lm(formula = y ~ sex * age, data = m)

Residuals:
    Min      1Q  Median      3Q     Max 
-71.817 -13.848  -0.116  13.758  75.263 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   82.36936    1.81296  45.434  < 2e-16 ***
sexfemale     14.92308    2.78901   5.351 9.42e-08 ***
age           -0.49296    0.04809 -10.250  < 2e-16 ***
sexfemale:age -0.02241    0.06742  -0.332     0.74    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 20.18 on 2996 degrees of freedom
Multiple R-squared:  0.1229,    Adjusted R-squared:  0.122 
F-statistic: 139.9 on 3 and 2996 DF,  p-value: < 2.2e-16
interactions::interact_plot(mod1, pred = age, modx = sex, interval = TRUE)